<verb> [modifiers] <subject>
for example:
ls - list directory (current by default)
ls dir - list directory dir
ls -lah dir - list directory dir with details, one item per line, including hidden files, and in human-readable format
Default: a value for an argument when no explicit argument is given (where it makes sense, like ls)
Up arrow and down arrow scroll through command history.
Very important!
<Ctrl-R> - reverse history search
<Tab> - completion: never have to type the whole thing
<Alt-.> - inserts last argument from the history
Path: /<dir1>/<dir2>/<dir3>/foo.bar
for example: /home/ilya/src
Absolute path:
begins with /, aka root. Location relative to the filesystem root.
Relative path:
does not begin with /. Location relative to the current directory.
pwd - print working directory
cd <dir> - change directory
mkdir [options] <dir> - make new directory
ls [options] [<dir>...] - list directory
Exercise: cd into your Downloads (or Documents) directory; list your Documents directory using different options.
These are huge time savers. But they are nothing more than aliases:
~ - current user's home directory (/home/<user>/ or /Users/<user>/ on Mac OS)
. - current directory
.. - parent directory
- - last directory (although in most contexts it means stdin)
Other useful things:
pushd <dir> - pushes directory <dir> onto the stack
popd - pops the last pushed directory from the stack
These two can be thought of as "remember for later" and "recall the last remembered" commands.
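For instance, a quick session might look like this (the directories are just examples):

$ cd /tmp        # jump somewhere else
$ cd -           # come back to the previous directory
$ pushd /etc     # remember where we are, then go to /etc
$ popd           # return to the remembered directory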
Files that have the x bit set in their permissions are executable. These can be executed by typing their path (absolute or relative) at the prompt:
$ /home/vasyapupkin/myprog1
$ ./myprog1
$ /bin/myprog1
or they can be executed by typing just their name at the prompt if their location is listed in the PATH variable:
$ echo $PATH
$ myprog1
if unsure, use the which program to find the executable (if it exists!):
$ which python
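If your own scripts live somewhere that isn't on PATH yet, you can append that location yourself. A minimal sketch, assuming you keep them in ~/bin (any directory works):

$ mkdir -p ~/bin                   # a personal bin directory (example location)
$ export PATH="$PATH:$HOME/bin"    # add it to PATH for the current session
$ echo $PATH                       # check that it is now listed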
cat will output its arguments to stdout
less will do the same but in a humane way (pagination, search, scrolling, etc.)
man displays a help page for a given command
head outputs the first n lines of a file
tail outputs the last n lines of a file
Create a directory:
mkdir <dirname>
usual path rules apply (see absolute vs relative paths). Fancy switch -p creates all missing parent directories along the way:
mkdir -p path/to/my/new/dir
mkdir -p path/to/{one,two,three}
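The second form relies on the shell's brace expansion: the shell rewrites it into three separate paths before mkdir even runs. You can see the expansion (and the result) like this:

$ echo mkdir -p path/to/{one,two,three}    # what the shell actually passes to mkdir
mkdir -p path/to/one path/to/two path/to/three
$ ls path/to
one  three  two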
by default, cp only copies regular files and skips directories. To copy directories use the -r (recursive) option:
cp -r <source_dir> <destination>
but watch for that trailing slash:
cp -r <source>/ <destination>
behaves differently. Why?
Globbing works as one would expect:
cp <source>/*.txt <destination>
will copy all files ending with .txt to <destination>
But what if we want to move a bunch of stuff? Sure this should work:
mv <source>/*.txt <destination>
but it doesn't. WTF?
Cheating way: install the rename program.
Won't work if you don't have admin rights though.
Proper way: loop
for f in *.txt; do mv "$f" <destination>; done
HINT: for a dry run replace mv with echo
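For example, a dry run first, then the real thing; the same loop trick also handles bulk renaming via bash's ${var%suffix} expansion (the .bak extension is just an illustration, and <destination> is a placeholder as above):

$ for f in *.txt; do echo mv "$f" <destination>; done    # dry run: only prints the commands
$ for f in *.txt; do mv "$f" "${f%.txt}.bak"; done       # rename every .txt to .bak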
CAUTION: There is no undelete. If you delete a file, it's gone forever!
Delete (remove) a file (or files):
rm <file>
Delete a directory:
rm -r <directory>
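Given that warning, the -i (interactive) flag is a useful habit: rm will ask before each deletion. The file names below are hypothetical:

$ rm -i notes.txt         # asks for confirmation before removing the file
$ rm -ri old_project/     # asks before descending into and removing the directory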
grep stands for Global Regular ExPression. Regular expressions (regex) are an advanced and powerful way to match patterns.
grep can be thought of as a very versatile and efficient filter that can be configured to pass through only results you want.
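A few typical invocations (the pattern and file name are made up):

$ grep "gene" annotation.txt       # print only the lines containing "gene"
$ grep -v "gene" annotation.txt    # print only the lines that do NOT contain it
$ grep -c "gene" annotation.txt    # just count the matching lines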
| (aka pipe) - sends the output of the left program to the input of the right program
tee - same as a pipe, but at the same time saves the output of the left command into a file
> - redirects the output of the program to a file (overwriting the file if it exists)
>> - same as > but appends to the file if it exists
wget - downloads files over the network; loads of options and protocols supported. Read the man pages for all options.
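A small sketch of how the pipe and redirection operators fit together (listing.txt is an arbitrary file name):

$ ls -lah | less                      # page through a long listing
$ ls -lah > listing.txt               # save the listing to a file (overwrites)
$ ls -lah >> listing.txt              # run again, appending this time
$ ls -lah | tee listing.txt | less    # view the listing and save it at the same time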
Let's use wget to download the E. coli .gff file from NCBI (http://www.ncbi.nlm.nih.gov/genome/167):
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.gff.gz
and make sure it's where you expect it to be:
ls -lah *.gff.gz
and download some more stuff:
wget ngs.nudlerlab.info/master.zip
wget ngs.nudlerlab.info/BJ-HSR1.pe.fastq.gz
Now move the downloaded .gff file into the data directory and check that it's there:
mv GCF_000005845.2_ASM584v2_genomic.gff.gz ../data
ls -lah ../data | grep gff
Then take a quick peek at the first few lines:
zcat ../data/GCF_000005845.2_ASM584v2_genomic.gff.gz | head
Most NGS data formats are text based and, therefore, highly compressible. For instance, a gzipped .fastq file can take 10-20% of the original space.
gzip - compresses the file
gunzip - uncompresses the file
By default both gzip and gunzip delete the original. To keep the original file use zcat or the -c flag for gzip/gunzip.
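For example (reads.fastq is a hypothetical file):

$ gzip reads.fastq                        # creates reads.fastq.gz, removes reads.fastq
$ gunzip reads.fastq.gz                   # the reverse: restores reads.fastq, removes the .gz
$ gzip -c reads.fastq > reads.fastq.gz    # compress but keep the original
$ zcat reads.fastq.gz | head              # peek inside without uncompressing on disk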
It's a perfect use case for pipes, so let's dig right in. Let's have a look at what's inside the .gff file we've just downloaded:
zcat GCF_000005845.2_ASM584v2_genomic.gff.gz | less
How can we modify the above to show only the beginning of the file? The end of the file?
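One possible answer, using head and tail from above:

zcat GCF_000005845.2_ASM584v2_genomic.gff.gz | head    # first 10 lines
zcat GCF_000005845.2_ASM584v2_genomic.gff.gz | tail    # last 10 lines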
tar is for working with archived (and usually compressed) directories
sort - self-explanatory. Sorts the input in a variety of ways. Really useful when chaining several programs using pipes.
uniq - outputs unique items from the input stream. Can count occurrences of each item. Note that it only collapses adjacent duplicates, which is why it is usually preceded by sort. Again, really shines when used with other programs.
tr - translates or deletes characters from the input stream. Doesn't sound like much but is a real time-saver when building pipelines and workflows.
wc - word count. Self-explanatory: counts lines, words, and characters. Again, useful for composing "compound" commands from simple programs.
Again, to get the full list of available options use the man <program> command.
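A tiny illustration of how these compose, feeding three literal lines in with printf:

$ printf 'gene\nCDS\ngene\n' | sort | uniq -c    # count occurrences of each item
      1 CDS
      2 gene
$ printf 'gene\nCDS\ngene\n' | wc -l             # count the lines instead
3
$ echo 'a,b,c' | tr ',' '\t'                     # turn commas into tabs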
Coming back to the .gff file. Let's see how we can build a nice little summary of E. coli genomic features (genes, CDS, and so on).
For starters:
zcat GCF_000005845.2_ASM584v2_genomic.gff.gz | less
good, but what's up with all those lines starting with #? Those are comments and we want to get rid of them.
grep to the rescue:
zcat GCF_000005845.2_ASM584v2_genomic.gff.gz | grep -v ^# | less
Better! So we are left with a tab-delimited file (a relative of .csv, really). Now we see we're interested in the 3rd column. Let's split each line on tabs and take the third field:
zcat GCF_000005845.2_ASM584v2_genomic.gff.gz | grep -v ^# | cut -f 3 | less
Ugly! But what if we sort the values and count unique items?
zcat GCF_000005845.2_ASM584v2_genomic.gff.gz | grep -v ^# | cut -f 3 | sort | uniq -c
And there you have it!
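And if you want the most abundant feature types at the top, add one more sort at the end (sorting numerically, in reverse, on the counts that uniq -c produced):

zcat GCF_000005845.2_ASM584v2_genomic.gff.gz | grep -v ^# | cut -f 3 | sort | uniq -c | sort -rn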
In UNIX, access to files is controlled via permissions.
There are three levels of permissions:
u - user
g - group
o - others
Permissions:
r - read permission
w - write permission (also create or delete)
x - eXecute permission (directories must have x permission set in order to be able to cd into them!!!)
ls -l command will output lines starting with the permissions part.
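For example, a line like the one below (everything except the permissions field is made up) reads as: a regular file (the leading -), owner can read/write/execute, group can read and execute, others can only read:

-rwxr-xr--  1 ilya users  4096 Jan 11 12:00 myprog1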
By default, only a file's owner (and root) has access to it.
Relevant commands:
chown - change the owner (must have permission to do so!)
chgrp - change file's group
chmod - change permission(s)
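Typical invocations (the file names are examples; the u/g/o letters are the same as above):

$ chmod +x myprog1             # make a file executable
$ chmod u+rw,go-w notes.txt    # owner gets read/write; group and others lose write
$ chmod 644 notes.txt          # the same idea in octal: rw- for user, r-- for group and others
$ chown ilya notes.txt         # change the owner (needs the rights to do so)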